Learning to classify e-mail

Authors

  • Irena Koprinska
  • Josiah Poon
  • James Clark
  • Jason Chan

Abstract

In this paper we study supervised and semi-supervised classification of e-mails. We consider two tasks: filing e-mails into folders and spam e-mail filtering. Firstly, in a supervised learning setting, we investigate the use of random forest for automatic e-mail filing into folders and spam e-mail filtering. We show that random forest is a good choice for these tasks as it runs fast on large and high-dimensional databases, is easy to tune and is highly accurate, outperforming popular algorithms such as decision trees, support vector machines and naïve Bayes. We introduce a new accurate feature selector with linear time complexity. Secondly, we examine the applicability of the semi-supervised co-training paradigm for spam e-mail filtering by employing random forests, support vector machines, decision trees and naïve Bayes as base classifiers. The study shows that a classifier trained on a small set of labelled examples can be successfully boosted using unlabelled examples to an accuracy rate only 5% lower than that of a classifier trained on all labelled examples. We investigate the performance of co-training with one natural feature split and show that in the domain of spam e-mail filtering it can be as competitive as co-training with two natural feature splits. © 2006 Elsevier Inc. All rights reserved.
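
The co-training idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the subject/body feature split, the toy messages, the function names `train_nb`, `predict_nb` and `co_train`, and the rule of adding the single most confident example per view in each round are all assumptions made for illustration. Naïve Bayes is used as the base classifier only because it is one of those named in the abstract and is short to write in plain Python.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model on tokenised documents."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d}
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(d)
    return classes, vocab, counts, Counter(labels)

def predict_nb(model, doc):
    """Return (label, confidence); confidence is the posterior of the best class."""
    classes, vocab, counts, priors = model
    n = sum(priors.values())
    logp = {}
    for c in classes:
        total = sum(counts[c].values())
        lp = math.log(priors[c] / n)
        for w in doc:
            if w in vocab:  # unseen words are ignored
                lp += math.log((counts[c][w] + 1) / (total + len(vocab)))
        logp[c] = lp
    m = max(logp.values())
    z = sum(math.exp(v - m) for v in logp.values())
    best = max(classes, key=lambda c: logp[c])
    return best, math.exp(logp[best] - m) / z

def co_train(labelled, unlabelled, rounds=5, per_round=1):
    """Simplified co-training over two views.

    labelled:   list of ((view_a, view_b), label)
    unlabelled: list of (view_a, view_b)
    Each view's classifier in turn labels its most confident pool
    examples, which are moved into the shared labelled set.
    """
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            model = train_nb([x[view] for x, _ in labelled],
                             [y for _, y in labelled])
            scored = [(predict_nb(model, x[view]), i) for i, x in enumerate(pool)]
            scored.sort(key=lambda s: s[0][1], reverse=True)
            chosen = scored[:per_round]
            for (label, _), i in chosen:
                labelled.append((pool[i], label))
            for i in sorted((i for _, i in chosen), reverse=True):
                del pool[i]
    models = [train_nb([x[v] for x, _ in labelled], [y for _, y in labelled])
              for v in (0, 1)]
    return models, labelled

# Toy data (hypothetical): view 0 = subject tokens, view 1 = body tokens.
labelled = [
    ((("cheap", "pills"), ("buy", "now", "discount")), "spam"),
    ((("meeting", "agenda"), ("see", "you", "monday")), "ham"),
]
unlabelled = [
    (("cheap", "offer"), ("discount", "now")),
    (("agenda", "update"), ("monday", "schedule")),
]
models, grown = co_train(labelled, unlabelled)
print(predict_nb(models[0], ("offer",))[0])
```

In this toy run the word "offer" never appears in the two hand-labelled messages, so the subject classifier can only learn it from an example that co-training labelled automatically.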

Similar resources

e-Learning Theories with Emphasis on Independence Theory

Introduction: The basis of distance learning rests on the independence of the learner. The independent learning-teaching process is an educational system in which each learner is independent and separated from their teacher by time and place. Hence the present study seeks to examine E-learning Theories in general, but focuses on Independence Theory. Methods: The present study was conducte...

e-learning Utilization Based on the Problem-Solving Approach

Introduction & Objective: Paying attention to the process and approaches to the problem solving from the view of the e-learning courses designers, will improve the aspects of development. The problem-based learning provides the discovery structure and helps the students to internalize their learning. Therefore, the purpose of this study is to investigate the factors that lead to more utili...

E-learning Adoption by Faculty Members of Kermanshah University of Medical Sciences and Health Services: Faculties’ Viewpoints

Introduction: E-learning is an individualized education and due to its low costs, Iranian universities have suggested the use of this method. In this context, recognizing the factors affecting the adoption and use of e-learning is important. The study was performed to determine the usefulness, ease of use and perceived barriers to E-learning considering the viewpoints of faculty members of Kerm...

Comparing Academic Achievement in Lecture-based Learning Versus Problem-based Learning among Medical Students: A Systematic Review

Introduction: Nowadays researchers widely believe that there are differences in the effectiveness of problem-based training compared to traditional methods such as lecture. This study was done in order to compare academic achievement in lecture-based learning versus problem-based learning among medical students through a systematic review. Methods: This study is a secondary research done using...

Structure Learning in Bayesian Networks Using Asexual Reproduction Optimization

A new structure learning approach for Bayesian networks (BNs) based on asexual reproduction optimization (ARO) is proposed in this letter. ARO can be essentially considered as an evolutionary based algorithm that mathematically models the budding mechanism of asexual reproduction. In ARO, a parent produces a bud through a reproduction operator; thereafter the parent and its bud compete to survi...

Systematic review of learning changes as technology grows

Introduction: With the advent of information and communication technology, in recent decades, a new gate opened to human beings and all its biological dimensions, and created many changes in the field of education and learning. Accordingly, the purpose of this study is to investigate how changes have been made in how learners learn from the growth and advancement of technologies. Methods:...

Journal:
  • Inf. Sci.

Volume 177  Issue 

Pages  -

Publication date: 2007